345 research outputs found

    New secondary curriculum: vision into practice - leadership case studies


    A Mouth Full of Words: Visually Consistent Acoustic Redubbing

    This paper introduces a method for automatic redubbing of video that exploits the many-to-many mapping of phoneme sequences to lip movements modelled as dynamic visemes [1]. For a given utterance, the corresponding dynamic viseme sequence is sampled to construct a graph of possible phoneme sequences that synchronize with the video. When composed with a pronunciation dictionary and language model, this produces a vast number of word sequences that are in sync with the original video, literally putting plausible words into the mouth of the speaker. We demonstrate that traditional, one-to-many, static visemes lack flexibility for this application, as they produce significantly fewer word sequences. This work explores the natural ambiguity in visual speech, offering insight for automatic speech recognition and highlighting the importance of language modelling.
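The combinatorial growth the abstract describes can be illustrated with a toy enumeration. The viseme classes and their candidate phoneme options below are hypothetical stand-ins, not the paper's model; the point is only that a many-to-many viseme-to-phoneme mapping multiplies the number of synchronized phoneme sequences:

```python
from itertools import product

# Toy illustration (not the paper's data): each dynamic viseme in the
# sampled sequence maps to SEVERAL candidate phonemes, so the number of
# phoneme sequences consistent with the video grows multiplicatively.
viseme_to_phonemes = {
    "v1": ["p", "b", "m"],   # bilabial closures look alike on the lips
    "v2": ["ae", "eh"],
    "v3": ["t", "d", "n"],
}

def candidate_phoneme_sequences(viseme_seq):
    """Enumerate every phoneme sequence consistent with the viseme sequence."""
    options = [viseme_to_phonemes[v] for v in viseme_seq]
    return ["".join(seq) for seq in product(*options)]

seqs = candidate_phoneme_sequences(["v1", "v2", "v3"])
print(len(seqs))  # 3 * 2 * 3 = 18 candidate sequences
```

In the paper's pipeline these candidates would then be composed with a pronunciation dictionary and language model to keep only sequences that form plausible word strings.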

    The Effect of Speaking Rate on Audio and Visual Speech

    The speed at which an utterance is spoken affects both the duration of the speech and the position of the articulators. Consequently, the sounds that are produced are modified, as are the position and appearance of the lips, teeth, tongue and other visible articulators. We describe an experiment designed to measure the effect of variable speaking rate on audio and visual speech by comparing sequences of phonemes and dynamic visemes appearing in the same sentences spoken at different speeds. We find that both audio and visual speech production are affected by varying the rate of speech; however, the effect is significantly more prominent in visual speech.
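One simple way to quantify differences between a slow-rate and a fast-rate transcription of the same sentence is an edit distance over phoneme (or viseme) label sequences. The metric below is an illustrative choice, not necessarily the comparison method used in the study:

```python
def levenshtein(a, b):
    """Edit distance between two label sequences (strings or lists).

    A small distance means the two transcriptions of the same sentence
    use nearly the same units; a large distance indicates insertions,
    deletions, or substitutions introduced by the change in speaking rate.
    """
    dp = list(range(len(b) + 1))          # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # deletion, insertion, or substitution (free if labels match)
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

# e.g. a fast-rate rendition reducing a vowel:
print(levenshtein(["p", "ae", "t"], ["p", "ah", "t"]))  # 1
```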

    Hand Keypoint Detection in Single Images using Multiview Bootstrapping

    We present an approach that uses a multi-camera system to train fine-grained detectors for keypoints that are prone to occlusion, such as the joints of a hand. We call this procedure multiview bootstrapping: first, an initial keypoint detector is used to produce noisy labels in multiple views of the hand. The noisy detections are then triangulated in 3D using multiview geometry or marked as outliers. Finally, the reprojected triangulations are used as new labeled training data to improve the detector. We repeat this process, generating more labeled data in each iteration. We derive a result analytically relating the minimum number of views to achieve target true and false positive rates for a given detector. The method is used to train a hand keypoint detector for single images. The resulting keypoint detector runs in real time on RGB images and has accuracy comparable to methods that use depth sensors. The single view detector, triangulated over multiple views, enables 3D markerless hand motion capture with complex object interactions. Comment: CVPR 2017.
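The iterative procedure (detect in every view, triangulate robustly, relabel, retrain) can be sketched in one dimension. This is a toy stand-in, not the paper's implementation: a median across views replaces true multiview triangulation, and no detector is actually retrained:

```python
import statistics

def multiview_bootstrap_round(detect, views, tol=2.0):
    """One round of a toy 1-D multiview bootstrapping sketch.

    `detect(view)` returns a noisy keypoint estimate for one view. A robust
    median across views stands in for 3D triangulation; detections far from
    it are marked as outliers instead of being reused as labels.
    """
    dets = [detect(v) for v in views]
    tri = statistics.median(dets)                       # "triangulated" keypoint
    inliers = [d for d in dets if abs(d - tri) <= tol]  # reprojected labels
    # a real system would retrain `detect` on the inlier labels here,
    # then repeat the whole round with the improved detector
    return tri, inliers
```

The design point the abstract makes survives even in this sketch: agreement across views lets a weak per-view detector generate cleaner labels than it could produce on its own.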

    Joint Learning of Facial Expression and Head Pose from Speech


    Predicting Head Pose from Speech with a Conditional Variational Autoencoder

    Natural movement plays a significant role in realistic speech animation. Numerous studies have demonstrated the contribution visual cues make to the degree we, as human observers, find an animation acceptable. Rigid head motion is one visual mode that universally co-occurs with speech, and so it is a reasonable strategy to seek a transformation from the speech mode to predict the head pose. Several previous authors have shown that prediction is possible, but experiments are typically confined to rigidly produced dialogue. Natural, expressive, emotive and prosodic speech exhibits motion patterns that are far more difficult to predict, with considerable variation in expected head pose. Recently, Long Short-Term Memory (LSTM) networks have become an important tool for modelling speech and natural language tasks. We employ Deep Bi-Directional LSTMs (BLSTMs), capable of learning long-term structure in language, to model the relationship that speech has with rigid head motion. We then extend our model by conditioning with prior motion. Finally, we introduce a generative head motion model, conditioned on audio features using a Conditional Variational Autoencoder (CVAE). Each approach mitigates the problems of the one-to-many mapping that a speech-to-head-pose model must accommodate.
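The generative step can be sketched with the standard CVAE sampling recipe. The linear encoder maps and the `decode` callable below are hypothetical placeholders, not the paper's architecture; the sketch only shows why sampling handles the one-to-many mapping:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_head_pose(audio_feat, W_mu, W_logvar, decode):
    """Sketch of CVAE-style sampling for head pose (hypothetical linear maps).

    Conditioning on the audio fixes a *distribution* over latent codes rather
    than a single pose; each draw of `eps` yields a different yet plausible
    head pose for the same speech.
    """
    mu = W_mu @ audio_feat
    sigma = np.exp(0.5 * (W_logvar @ audio_feat))
    eps = rng.standard_normal(mu.shape)
    z = mu + sigma * eps          # reparameterisation trick
    return decode(z, audio_feat)
```

A deterministic speech-to-pose regressor must collapse all plausible poses into one average motion; sampling a latent code per frame sequence avoids that collapse.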

    Audio-to-Visual Speech Conversion using Deep Neural Networks

    We study the problem of mapping from acoustic to visual speech with the goal of generating accurate, perceptually natural speech animation automatically from an audio speech signal. We present a sliding window deep neural network that learns a mapping from a window of acoustic features to a window of visual features from a large audio-visual speech dataset. Overlapping visual predictions are averaged to generate continuous, smoothly varying speech animation. We outperform a baseline HMM inversion approach in both objective and subjective evaluations and perform a thorough analysis of our results.
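The overlap-averaging output stage described above can be sketched in a few lines. The array shapes and hop convention here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def overlap_average(window_preds, hop):
    """Average overlapping per-window predictions into one smooth trajectory.

    Toy sketch of the output stage: row i of `window_preds` holds the
    predicted visual features for a window starting at frame i*hop; frames
    covered by several windows receive the mean of all their predictions,
    which smooths discontinuities at window boundaries.
    """
    n_windows, win, dim = window_preds.shape
    total = (n_windows - 1) * hop + win
    acc = np.zeros((total, dim))   # summed predictions per frame
    cnt = np.zeros((total, 1))     # how many windows covered each frame
    for i, w in enumerate(window_preds):
        acc[i * hop : i * hop + win] += w
        cnt[i * hop : i * hop + win] += 1
    return acc / cnt
```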

    ADEnosine testing to determine the need for Pacing Therapy with the additional use of an Implantable Loop Recorder (ADEPT-ILR)

    MD Thesis. Aim: To determine the efficacy of permanent pacing in preventing syncopal episodes in patients with unexplained syncope and a positive adenosine test via a randomised double-blind placebo-controlled crossover trial with an accompanying negative adenosine test implantable loop recorder arm. Methods: Individuals presenting to secondary care with unexplained syncope underwent adenosine testing as defined by the European Society of Cardiology. Those with a positive test had a permanent pacemaker implant and were randomised to pacemaker on or off for 6 months before crossing over to the alternative mode. Those with a negative adenosine test underwent a loop recorder implantation. The primary outcome was cumulative syncope burden as reported by monthly syncope diaries. Results: Fifty-two patients were included in the trial and had adenosine testing. There were 35 positive adenosine tests (67%) and 17 negative adenosine tests (33%). There was a mean of 0.4 fewer syncopal episodes per patient during the pacemaker on period compared to the pacemaker off period (1.2 vs. 1.6 episodes), with a higher relative risk of syncope in the pacemaker off period compared with the pacemaker on period (RR 2.1, 95% CI 1.0 to 4.4, p=0.048). In the adenosine negative arm, one patient developed bradycardia requiring permanent pacing, giving a negative predictive value of the adenosine test for identifying a bradycardia pacing indication of 0.94 (95% CI 0.69 to 1.0). Conclusion: Permanent pacing reduces the syncope burden in patients with unexplained syncope and a positive adenosine test, whilst a high negative predictive value demonstrates the low likelihood of a missed opportunity for pacemaker implantation. Our study suggests that a positive adenosine test unmasks bradycardia pacing indications without the need for prolonged and invasive investigations, providing opportunity for early and effective intervention.

    Environment, alcohol intoxication and overconfidence: evidence from a lab-in-the-field experiment

    Alcohol has long been known as the demon drink, an epithet owed to the numerous social ills it is associated with. Our lab-in-the-field experiment assesses the extent to which changes in intoxication and an individual's environment lead to changes in overconfidence or cognitive ability that are, in turn, often linked to problematic behaviours. Results indicate that it is the joint effect of being intoxicated in a bar, rather than simply being intoxicated, that matters. Subjects systematically underestimated the magnitude of their behavioural changes, suggesting that they cannot be held fully accountable for their actions.